We propose a generalized functional linear regression model for
a regression situation where the response variable is a scalar and 
the predictor is a random function. A linear predictor is obtained by 
forming the scalar product of the predictor function with a smooth 
parameter function, and the expected value of the response is related 
to this linear predictor via a link function. If, in addition, a variance 
function is specified, this leads to a functional estimating equation 
which corresponds to maximizing a functional quasi-likelihood. This 
general approach includes the special cases of the functional linear 
model, as well as functional Poisson regression and functional bino- 
mial regression. The latter leads to procedures for classification and 
discrimination of stochastic processes and functional data. We also 
consider the situation where the link and variance functions are un- 
known and are estimated nonparametrically from the data, using a 
semiparametric quasi-likelihood procedure. 

An essential step in our proposal is dimension reduction by approximating
the predictor processes with a truncated Karhunen-Loève
expansion. We develop asymptotic inference for the proposed class of
generalized regression models. In the proposed asymptotic approach,
the truncation parameter increases with sample size, and a martingale
central limit theorem is applied to establish the resulting increasing
dimension asymptotics. We establish asymptotic normality for a
properly scaled distance between estimated and true functions that
corresponds to a suitable $L^2$ metric and is defined through a generalized
covariance operator. As a consequence, we obtain asymptotic
tests and simultaneous confidence bands for the parameter function
that determines the model.

The proposed estimation, inference and classification procedures 
and variants with unknown link and variance functions are investigated
in a simulation study. We find that the practical selection of the
number of components works well with the AIC criterion, and this 
finding is supported by theoretical considerations. We include an ap- 
plication to the classification of medflies regarding their remaining 
longevity status, based on the observed initial egg-laying curve for 
each of 534 female medflies. 

1. Introduction. Many studies involve tightly spaced repeated measure- 
ments on the same individuals or direct recordings of a sample of curves 
[Brumback and Rice (1998) and Staniswalis and Lee (1998)]. If longitudinal 
measurements are made on a suitably dense grid, such data can often be re- 
garded as a sample of curves or as functional data. Examples can be found 
in studies on longevity and reproduction, where typical subjects are fruit 
flies [Müller et al. (2001)] or nematodes [Wang, Müller, Capra and Carey
(1994)]. 

Our procedures are motivated by a study where the goal is to find out 
whether there is information in the egg-laying curve observed for the first 
30 days of life for female medflies, regarding whether the fly is going to be 
long-lived or short-lived. Discrimination and classification of curve data is 
of wide interest, from engineering [Hall, Poskitt and Presnell (2001)] and
astronomy [Hall, Reimann and Rice (2000)] to DNA expression arrays with 
repeated measurements, where dynamic classification of genes is of interest 
[Alter, Brown and Botstein (2000)]. For multivariate predictors with fixed 
dimension, such discrimination tasks are often addressed by fitting binomial 
regression models using quasi-likelihood based estimating equations. 

Given the importance of discrimination problems for curve data, it is 
clearly of interest to extend notions such as logistic, binomial or Poisson
regression to the case of a functional predictor, which may often be viewed as a
random predictor process. More generally, there is a need for new models and 
procedures allowing one to regress univariate responses of various types on 
a predictor process. The extension from the classical situation with a finite- 
dimensional predictor vector to the case of an infinite-dimensional predictor 
process involves a distinctly different and more complicated technology. One 
characteristic feature is that the asymptotic analysis involves increasing di- 
mension asymptotics, where one considers a sequence of increasingly larger 
models. 

The functional linear regression model with functional or continuous re- 
sponse has been the focus of various investigations [see Ramsay and Silverman 
(1997), Faraway (1997), Cardot, Ferraty and Sarda (1999) and Fan and Zhang 
(2000)]. An applied version of a generalized linear model with functional 
predictors has been investigated by James (2002). We assume here that the 
dependent variable is univariate and continuous or discrete, for example, 
of binomial or Poisson type, and that the predictor is a random function. 
The main idea is to employ a Karhunen-Loève or other orthogonal expansion
of the random predictor function [see, e.g., Ash and Gardner (1975) and
Castro, Lawton and Sylvestre (1986)], with the aim to reduce the dimension 
to the first few components of such an expansion. The expansion is therefore 
truncated at a finite number of terms which increases asymptotically. 

Once the dimension is reduced to a finite number of components, the ex- 
pansion coefficients of the predictor process determine a finite-dimensional 
vector of random variables. We can then apply the machinery of generalized 
linear or quasi-likelihood models [Wedderburn (1974)], essentially solving 
an estimating or generalized score equation. The resulting regression coeffi- 
cients obtained for the linear predictor in such a model then provide us with 
an estimate of the parameter function of the generalized functional regres- 
sion model. This parameter function replaces the parameter vector of the 
ordinary finite-dimensional generalized linear model. We derive an asymp- 
totic limit result (Theorem 4.1) for the deviation between estimated and 
true parameter function for increasing dimension asymptotics, referring to 
a situation where the number of components in the model increases with 
sample size. 

Asymptotic tests for the regression effect and simultaneous confidence 
bands are obtained as corollaries of this main result. We include an extension 
to the case of a semiparametric quasi-likelihood regression (SPQR) model 
in which link and variance functions are unknown and are estimated from 
the data, extending previous approaches of Chiou and Müller (1998, 1999),
and also provide an analysis of the AIC criterion for order selection. 

The paper is organized as follows: The basics of the proposed generalized 
functional linear model and some preliminary considerations can be found in 
Section 2. The underlying ideas of estimation and statistical analysis within 
the generalized functional linear model will be discussed in Section 3. The 
main results and their ramifications are described in Section 4, preceded by 
a discussion of the appropriate metric in which to formulate the asymptotic 
result, which is found to be tied to the link and variance functions used for 
the generalized functional linear model. Simulation results are reported in 
Section 5. An illustrative example for the special case of binomial functional 
regression with the goal to discriminate between short- and long-lived med- 
flies is provided in Section 6. This is followed by the main proofs in Section 7. 
Proofs of auxiliary results are in the Appendix. 

2. The generalized functional linear model. The data we observe for
the $i$th subject or experimental unit are $(\{X_i(t), t \in T\}, Y_i)$, $i = 1, \ldots, n$. We
assume that these data form an i.i.d. sample. The predictor variable $X(t)$,
$t \in T$, is a random curve which is observed per subject or experimental unit
and corresponds to a square integrable stochastic process on a real interval
$T$. The dependent variable $Y$ is a real-valued random variable which may
be continuous or discrete. For example, in the important special case of a
binomial functional regression, one would have $Y \in \{0, 1\}$.

Assume that a link function $g(\cdot)$ is given which is a monotone and twice
continuously differentiable function with bounded derivatives and is thus
invertible. Furthermore, we have a variance function $\sigma^2(\cdot)$ which is defined
on the range of the link function and is strictly positive. The generalized
functional linear model or functional quasi-likelihood model is determined
by a parameter function $\beta(\cdot)$ which is assumed to be square integrable on
its domain $T$, in addition to the link function $g(\cdot)$ and the variance function
$\sigma^2(\cdot)$.

Given a real measure $dw$ on $T$, define linear predictors

\[ \eta = \alpha + \int \beta(t) X(t)\, dw(t) \]

and conditional means $\mu = g(\eta)$, where $E(Y \mid X(t), t \in T) = \mu$ and
$\operatorname{Var}(Y \mid X(t), t \in T) = \sigma^2(\mu) = \tilde\sigma^2(\eta)$ for a function
$\tilde\sigma^2(\eta) = \sigma^2(g(\eta))$. In a generalized functional linear model the
distribution of $Y$ would be specified within the exponential family. For the
following (except where explicitly noted), it will be sufficient to consider
the functional quasi-likelihood model

\[ Y_i = g\Bigl(\alpha + \int \beta(t) X_i(t)\, dw(t)\Bigr) + e_i, \qquad i = 1, \ldots, n, \tag{1} \]

where

\[ E(e \mid X(t), t \in T) = 0, \qquad \operatorname{Var}(e \mid X(t), t \in T) = \sigma^2(\mu) = \tilde\sigma^2(\eta). \]

Note that $\alpha$ is a constant, and the inclusion of an intercept allows us to
require $E(X(t)) = 0$ for all $t$.

The errors $e_i$ are i.i.d. and we use integration w.r.t. the measure $dw(t)$
to allow for nonnegative weight functions $v(\cdot)$ such that $v(t) \ge 0$ for $t \in T$,
$v(t) = 0$ for $t \notin T$ and $dw(t) = v(t)\,dt$; the default choice will be $v(t) = 1_{\{t \in T\}}$.
Nonconstant weight functions might be of interest when the observed predic- 
tor processes are function estimates which may exhibit increased variability 
in some regions, for example, toward the boundaries. 

The parameter function $\beta(\cdot)$ is a quantity of central interest in the
statistical analysis and replaces the vector of slopes in a generalized linear model
or estimating equation based model. Setting $\sigma^2 = E\{\tilde\sigma^2(\eta)\}$, we then find

\[ \operatorname{Var}(e) = \operatorname{Var}\{E(e \mid X(t), t \in T)\} + E\{\operatorname{Var}(e \mid X(t), t \in T)\} = E\{\tilde\sigma^2(\eta)\} = \sigma^2, \]

as well as $E(e) = 0$.
Let $\rho_j$, $j = 1, 2, \ldots$, be an orthonormal basis of the function space $L^2(dw)$,
that is, $\int_T \rho_j(t)\rho_k(t)\, dw(t) = \delta_{jk}$. Then the predictor process $X(t)$ and the
parameter function $\beta(t)$ can be expanded into

\[ X(t) = \sum_{j=1}^{\infty} \varepsilon_j \rho_j(t), \qquad \beta(t) = \sum_{j=1}^{\infty} \beta_j \rho_j(t) \]

[in the $L^2(dw)$ sense] with r.v.'s $\varepsilon_j$ and coefficients $\beta_j$, given by
$\varepsilon_j = \int X(t)\rho_j(t)\, dw(t)$ and $\beta_j = \int \beta(t)\rho_j(t)\, dw(t)$, respectively. We note
that $E(\varepsilon_j) = 0$ and $\sum \beta_j^2 < \infty$. Writing $\sigma_j^2 = E(\varepsilon_j^2)$, we find
$\sum \sigma_j^2 = \int E(X^2(t))\, dw(t) < \infty$.

From the orthonormality of the base functions $\rho_j$, it follows immediately
that

\[ \int \beta(t) X(t)\, dw(t) = \sum_{j=1}^{\infty} \beta_j \varepsilon_j. \]
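As a numerical illustration of this identity, the following minimal Python sketch expands two functions in the Fourier sine basis $\sqrt{2}\sin(\pi j t)$ on $[0, 1]$ with the default weight $v(t) = 1$ and compares the integral with the coefficient sum. The coefficient values, the helper names `integrate` and `rho`, and the midpoint quadrature are assumptions made only for this example.

```python
import math

def integrate(f, n=2000):
    """Midpoint rule on [0, 1] for the default measure dw(t) = dt."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

def rho(j):
    """Fourier sine basis function rho_j(t) = sqrt(2) sin(pi j t), j >= 1."""
    return lambda t: math.sqrt(2.0) * math.sin(math.pi * j * t)

# Hypothetical finite expansions; the values are chosen only for illustration.
beta_coef = [1.0, -0.5, 0.25]   # beta_1, beta_2, beta_3
eps_coef = [0.3, 2.0, -1.2]     # epsilon_1, epsilon_2, epsilon_3

def beta(t):
    return sum(b * rho(j + 1)(t) for j, b in enumerate(beta_coef))

def X(t):
    return sum(e * rho(j + 1)(t) for j, e in enumerate(eps_coef))

# Left-hand side: integral of beta(t) X(t) dw(t).
lhs = integrate(lambda t: beta(t) * X(t))
# Right-hand side: sum over j of beta_j * epsilon_j.
rhs = sum(b * e for b, e in zip(beta_coef, eps_coef))
# The scores epsilon_j are recovered as integrals of X against rho_j.
recovered = [integrate(lambda t, j=j: X(t) * rho(j)(t)) for j in (1, 2, 3)]
```

Both `lhs` and `rhs` evaluate to the same linear predictor contribution up to quadrature and rounding error, which is what makes the reduction to the coefficient vector possible.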

It will be convenient to work with standardized errors

\[ e' = e/\sigma(\mu) = e/\tilde\sigma(\eta), \]

for which $E(e' \mid X) = 0$, $E(e') = 0$, $E(e'^2) = 1$. We assume that $E(e'^4) = \mu_4 < \infty$
and note that in model (1), the distribution of the errors $e_i$ does not
need to be specified and, in particular, does not need to be a member of
the exponential family. In this regard, model (1) is less an extension of the
classical generalized linear model [McCullagh and Nelder (1989)] than an
extension of the quasi-likelihood approach of Wedderburn (1974). We address
the difficulty caused by the infinite dimensionality of the predictors by
approximating model (1) with a series of models where the number of
predictors is truncated at $p = p_n$ and the dimension $p_n$ increases asymptotically
as $n \to \infty$.

A heuristic motivation for this truncation strategy is as follows: Setting

\[ U_p = \alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j, \qquad V_p = \sum_{j=p+1}^{\infty} \beta_j \varepsilon_j, \]

we find $E(Y \mid X(t), t \in T) = g(\alpha + \sum_{j=1}^{\infty} \beta_j \varepsilon_j) = g(U_p + V_p)$. Conditioning on
the first $p$ components and writing $F_{V_p \mid U_p}$ for the conditional distribution
function leads to a truncated link function $g_p$,

\[ E(Y \mid U_p) = g_p(U_p) = E[g(U_p + V_p) \mid U_p] = \int g(U_p + s)\, dF_{V_p \mid U_p}(s). \]

For the approximation of the full model by the truncated link function, we
note that the boundedness of $g'$, $|g'(\cdot)|^2 \le c$, implies that

\[ \Bigl\{\int [g(U_p + V_p) - g(U_p + s)]\, dF_{V_p \mid U_p}(s)\Bigr\}^2 \le \int g'(\xi)^2 (V_p - s)^2\, dF_{V_p \mid U_p}(s) \le 2c \int (V_p^2 + s^2)\, dF_{V_p \mid U_p}(s) \]

and, therefore,

\[ E\bigl((g(U_p + V_p) - g_p(U_p))^2\bigr) = E\Bigl(\Bigl\{\int [g(U_p + V_p) - g(U_p + s)]\, dF_{V_p \mid U_p}(s)\Bigr\}^2\Bigr) \le 2c\, E\bigl(V_p^2 + E(V_p^2 \mid U_p)\bigr) = 4c\, E(V_p^2) \le 4c \sum_{j=p+1}^{\infty} \beta_j^2 \sum_{j=p+1}^{\infty} \sigma_j^2. \tag{2} \]

The approximation error of the truncated model is seen to be directly tied to
$\operatorname{Var}(V_p)$ and is controlled by the sequence $\sigma_j^2 = \operatorname{Var}(\varepsilon_j)$, $j = 1, 2, \ldots$, which
for the special case of an eigenbase corresponds to a sequence of eigenvalues.

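Setting $\varepsilon_j^{(i)} = \int X_i(t)\rho_j(t)\, dw(t)$, the full model with standardized errors
$e'_i$ is

\[ Y_i = g\Bigl(\alpha + \sum_{j=1}^{\infty} \beta_j \varepsilon_j^{(i)}\Bigr) + e'_i\, \tilde\sigma\Bigl(\alpha + \sum_{j=1}^{\infty} \beta_j \varepsilon_j^{(i)}\Bigr), \qquad i = 1, \ldots, n. \]

With truncated linear predictors $\eta$ and means $\mu$,

\[ \eta_i = \alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}, \qquad \mu_i = g(\eta_i), \]

the $p$-truncated model becomes

\[ Y_i^{(p)} = g_p\Bigl(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Bigr) + e'_i\, \tilde\sigma_p\Bigl(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Bigr), \qquad i = 1, \ldots, n, \tag{3} \]

where $\tilde\sigma_p$ is defined analogously to $g_p$. Note that $g(U_p) - g_p(U_p)$ and,
analogously, $\tilde\sigma(U_p) - \tilde\sigma_p(U_p)$ are bounded by the error (2). Since it will be assumed
that this error vanishes asymptotically, as $p \to \infty$, we may instead of (3) work
with the approximating sequence of models

\[ Y_i^{(p)} = g\Bigl(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Bigr) + e'_i\, \tilde\sigma\Bigl(\alpha + \sum_{j=1}^{p} \beta_j \varepsilon_j^{(i)}\Bigr), \qquad i = 1, \ldots, n, \tag{4} \]

in which the functions $g$ and $\tilde\sigma$ are fixed. We note that the random variables
$Y_i^{(p)}$ and $e'_i$, $i = 1, \ldots, n$, form triangular arrays, $Y_{i,n}^{(p_n)}$ and $e'_{i,n}$, $i = 1, \ldots, n$,
with changing distribution as $n$ changes; for simplicity, we suppress the
indices $n$.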
Inference will be developed for the sequence of $p$-truncated models (4)
with asymptotic results for $p \to \infty$. The practical choice of $p$ in finite sample
situations will be discussed in Section 5. We also develop a version where
the link function $g$ is estimated from the data, given $p$. The practical
implementation of this semiparametric quasi-likelihood regression (SPQR) version
adapts to the changing link functions $g_p$ of the approximating sequence (3).

3. Estimation in the generalized functional linear model. One central
aim is estimation and inference for the parameter function $\beta(\cdot)$. Inference
for $\beta(\cdot)$ is of interest for constructing confidence regions and testing whether
the predictor function has any influence on the outcome, in analogy to the
test for regression effect in a classical regression model. The orthonormal
basis $\{\rho_j, j = 1, 2, \ldots\}$ is commonly chosen as the Fourier basis or the basis
formed by the eigenfunctions of the covariance operator. The eigenfunctions
can be estimated from the data as described in Rice and Silverman (1991)
or Capra and Müller (1997). Whenever estimation and inference for the
intercept $\alpha$ is to be included, we change the summation range for the linear
predictors $\eta_i$ on the right-hand side of the $p$-truncated model (3) to $\sum_{0}^{p}$
from $\sum_{1}^{p}$, setting $\varepsilon_0^{(i)} = 1$ and $\beta_0 = \alpha$. In the following, inclusion of $\alpha$ into
the parameter vector will be the default.

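Fixing $p$ for the moment, we are in the situation of the usual estimating
equation approach and can estimate the unknown parameter vector
$\beta^T = (\beta_0, \ldots, \beta_p)$ by solving the estimating or score equation

\[ U(\beta) = 0. \tag{5} \]

Setting $\varepsilon^{(i)T} = (\varepsilon_0^{(i)}, \ldots, \varepsilon_p^{(i)})$, $\eta_i = \sum_{j=0}^{p} \beta_j \varepsilon_j^{(i)}$, $\mu_i = g(\eta_i)$, $i = 1, \ldots, n$, the
vector-valued score function is defined by

\[ U(\beta) = \sum_{i=1}^{n} (Y_i - \mu_i)\, g'(\eta_i)\, \varepsilon^{(i)} / \sigma^2(\mu_i). \tag{6} \]

The solutions of the score equation (5) will be denoted by

\[ \hat\beta^T = (\hat\beta_0, \ldots, \hat\beta_p); \qquad \hat\alpha = \hat\beta_0. \tag{7} \]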

Relevant matrices which play a well-known role in solving the estimating
equation (5) are

\[ D = D_{n,p} = \bigl(g'(\eta_i)\, \varepsilon_k^{(i)} / \sigma(\mu_i)\bigr)_{1 \le i \le n,\ 0 \le k \le p}, \]

\[ V = V_{n,p} = \operatorname{diag}\bigl(\sigma^2(\mu_1), \ldots, \sigma^2(\mu_n)\bigr)_{1 \le i \le n}, \]

and with generic copies $\eta, \varepsilon, \mu$ of $\eta_i, \varepsilon^{(i)}, \mu_i$, respectively,

\[ \Gamma = \Gamma_p = (\gamma_{kl})_{0 \le k,l \le p}, \quad \gamma_{kl} = E\Bigl(\frac{g'^2(\eta)}{\sigma^2(\mu)}\, \varepsilon_k \varepsilon_l\Bigr), \qquad \Xi = \Gamma^{-1} = (\xi_{kl})_{0 \le k,l \le p}. \tag{8} \]

We note that $\Gamma = \frac{1}{n} E(D^T D)$ is a symmetric and positive definite matrix and
that the inverse matrix $\Xi$ exists. Otherwise, one would arrive at the
contradiction $E\bigl((\sum_{k=0}^{p} a_k \varepsilon_k\, g'(\eta)/\sigma(\mu))^2\bigr) = 0$ for nonzero constants $a_0, \ldots, a_p$.

With vectors $Y^T = (Y_1, \ldots, Y_n)$, $\mu^T = (\mu_1, \ldots, \mu_n)$, the estimating
equation $U(\beta) = 0$ can be rewritten as

\[ D^T V^{-1/2} (Y - \mu) = 0. \]

This equation is usually solved iteratively by the method of iterated weighted
least squares. Under our basic assumptions, as $\frac{1}{n} E(D^T D) = \Gamma_p$ is a fixed
positive definite matrix for each $p$, the existence of a unique solution for
each fixed $p$ is assured asymptotically.
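For a fixed $p$, the iteration can be sketched as follows for the binomial case with logistic link, where $g'(\eta) = \sigma^2(\mu) = \mu(1 - \mu)$ so that the score (6) reduces to $\sum_i (Y_i - \mu_i)\varepsilon^{(i)}$. This is a minimal sketch written as Fisher scoring, $\beta \leftarrow \beta + (D^T D)^{-1} U(\beta)$, which is one way to express the weighted least squares update; the function names, the fixed iteration count and the toy data are assumptions for illustration and not the authors' implementation.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in reversed(range(m)):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def g(eta):
    """Logistic link in the convention mu = g(eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def score(beta, E, Y):
    """Score (6) for the logistic case, where g'(eta) / sigma^2(mu) = 1."""
    U = [0.0] * len(beta)
    for e, y in zip(E, Y):
        mu = g(sum(b * x for b, x in zip(beta, e)))
        for k, x in enumerate(e):
            U[k] += (y - mu) * x
    return U

def iwls(E, Y, iters=25):
    """Fisher scoring for the p-truncated model; rows of E are (1, eps_1, ..., eps_p)."""
    p1 = len(E[0])
    beta = [0.0] * p1
    for _ in range(iters):
        DtD = [[0.0] * p1 for _ in range(p1)]
        for e in E:
            mu = g(sum(b * x for b, x in zip(beta, e)))
            w = mu * (1.0 - mu)          # g'(eta)^2 / sigma^2(mu)
            for k in range(p1):
                for l in range(p1):
                    DtD[k][l] += w * e[k] * e[l]
        step = solve(DtD, score(beta, E, Y))
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical scores for n = 10 subjects and p = 2 components (eps_0 = 1).
E = [[1, -1.2, 0.3], [1, -0.7, -0.9], [1, -0.1, 0.8], [1, 0.2, -0.4],
     [1, 0.6, 1.1], [1, 0.9, -0.2], [1, 1.4, 0.5], [1, -0.5, -0.1],
     [1, 0.3, 0.9], [1, -0.9, -0.6]]
Y = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
beta_hat = iwls(E, Y)
```

At convergence the score evaluated at `beta_hat` is numerically zero, which is the defining property (5) of the estimate.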

In the above developments we have assumed that both the link function
$g(\cdot)$ and the variance function $\sigma^2(\cdot)$ are known. Situations where the
link and variance functions are unknown are common, and we can extend
our methods to cover the general case where these functions are smooth,
which for fixed $p$ corresponds to the semiparametric quasi-likelihood
regression (SPQR) models considered in Chiou and Müller (1998, 1999). In the
implementation of SPQR one alternates nonparametric (smoothing) and
parametric updating steps, using a reasonable parametric model for the
initialization step. Since the link function is arbitrary, except for smoothness
and monotonicity constraints, we may require that estimates and parameters
satisfy $\|\beta\| = 1$, $\|\hat\beta\| = 1$ for identifiability.

For given $\hat\beta$, $\|\hat\beta\| = 1$, setting $\hat\eta_i = \sum_{j=0}^{p} \hat\beta_j \varepsilon_j^{(i)}$, updates of the link function
estimate $\hat g(\cdot)$ and its first derivative $\hat g'(\cdot)$ are obtained by smoothing (applying
any reasonable scatterplot smoothing method that allows the estimation
of derivatives) the scatterplot $(\hat\eta_i, Y_i)_{i=1,\ldots,n}$. Updates for the variance function
estimate $\hat\sigma^2(\cdot)$ are obtained by smoothing the scatterplot $(\hat\mu_i, \hat\varepsilon_i^2)_{i=1,\ldots,n}$,
where $\hat\mu_i = \hat g(\hat\eta_i)$ are current mean response estimates and $\hat\varepsilon_i^2 = (Y_i - \hat\mu_i)^2$ are
current squared residuals. The parametric updating step then proceeds by
solving the score equation (5), using the semiparametric score

\[ U(\beta) = \sum_{i=1}^{n} (Y_i - \hat g(\eta_i))\, \hat g'(\eta_i)\, \varepsilon^{(i)} / \hat\sigma^2(\hat g(\eta_i)). \tag{9} \]

This leads to the solutions $\hat\beta$, in analogy to (7). For solutions of the score
equations for both scores (6) and (9), we then obtain the regression function
estimates

\[ \hat\beta(t) = \hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j \rho_j(t). \tag{10} \]

Matrices $D$ and $V$ are modified analogously for the SPQR case, substituting
appropriate estimates.
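The smoothing step for the link function can be sketched with a local linear fit, which returns an estimate of the function and of its first derivative at the same time. This is a minimal sketch assuming an Epanechnikov kernel and a user-supplied bandwidth `h`; the function name `local_linear` and the toy scatterplot are hypothetical.

```python
def local_linear(x0, xs, ys, h):
    """Local linear fit at x0; returns (g_hat(x0), g_hat_prime(x0))."""
    S0 = S1 = S2 = T0 = T1 = 0.0
    for x, y in zip(xs, ys):
        u = (x - x0) / h
        if abs(u) >= 1.0:
            continue                      # outside the kernel window
        w = 0.75 * (1.0 - u * u)          # Epanechnikov kernel weight
        d = x - x0
        S0 += w
        S1 += w * d
        S2 += w * d * d
        T0 += w * y
        T1 += w * d * y
    det = S0 * S2 - S1 * S1
    if det <= 0.0:
        raise ValueError("bandwidth too small: need two distinct points in the window")
    # Weighted least squares solution for intercept (level) and slope (derivative).
    return (S2 * T0 - S1 * T1) / det, (S0 * T1 - S1 * T0) / det

# Toy scatterplot (eta_hat_i, Y_i) lying exactly on the line 2 + 3 * eta.
xs = [k / 10 for k in range(11)]
ys = [2.0 + 3.0 * x for x in xs]
g_hat, g_prime_hat = local_linear(0.5, xs, ys, h=0.3)
```

For exactly linear data the fit reproduces the line and its slope, which gives a simple sanity check before the smoother is applied to the scatterplots $(\hat\eta_i, Y_i)$ and $(\hat\mu_i, \hat\varepsilon_i^2)$.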

4. Asymptotic inference. Given an $L^2$-integrable integral kernel function
$R(s, t) : T^2 \to \mathbb{R}$, define the linear integral operator $A_R : L^2(dw) \to L^2(dw)$ on
the Hilbert space $L^2(dw)$ for $f \in L^2(dw)$ by

\[ (A_R f)(t) = \int f(s) R(s, t)\, dw(s). \tag{11} \]

Operators $A_R$ are compact self-adjoint Hilbert-Schmidt operators if

\[ \int\!\!\int |R(s, t)|^2\, dw(s)\, dw(t) < \infty, \]

and can then be diagonalized [Conway (1990), page 47].

Integral operators of special interest are the autocovariance operator $A_K$
of $X$ with kernel

\[ K(s, t) = \operatorname{cov}(X(s), X(t)) = E(X(s)X(t)) \tag{12} \]

and the generalized autocovariance operator $A_G$ with kernel

\[ G(s, t) = E\Bigl(\frac{g'(\eta)^2}{\sigma^2(\mu)}\, X(s)X(t)\Bigr). \tag{13} \]

Hilbert-Schmidt operators $A_R$ generate a metric in $L^2$,

\[ d_R^2(f, g) = \int (f(t) - g(t))\,(A_R(f - g))(t)\, dw(t) = \int\!\!\int (f(s) - g(s))(f(t) - g(t))\, R(s, t)\, dw(s)\, dw(t) \]

for $f, g \in L^2(dw)$, and given an arbitrary orthonormal basis $\{\rho_j, j = 1, 2, \ldots\}$,
the Hilbert-Schmidt kernels $R$ can be expressed as

\[ R(s, t) = \sum_{k,l} r_{kl}\, \rho_k(s) \rho_l(t) \]

for suitable coefficients $\{r_{kl}, k, l = 1, 2, \ldots\}$ [Dunford and Schwartz (1963),
page 1009]. Using for any given function $h \in L^2$ the notation

\[ h_{\rho,j} = \int h(s) \rho_j(s)\, dw(s) \]

and denoting the normalized eigenfunctions and eigenvalues of the operator
$A_R$ by $\{\rho_j^R, \lambda_j^R, j = 1, 2, \ldots\}$, the distance $d_R$ can be expressed as

\[ d_R^2(f, g) = \sum_{k,l} r_{kl}\, (f_{\rho,k} - g_{\rho,k})(f_{\rho,l} - g_{\rho,l}) = \sum_{k} \lambda_k^R\, (f_{\rho^R,k} - g_{\rho^R,k})^2. \tag{14} \]
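In the following we use the metric $d_G$, since it allows us to derive
asymptotic limits under considerably simpler conditions than for the $L^2$ metric,
due to its dampening effect on higher order frequencies. For the sequence of
$p_n$-truncated models (1) that we are considering,

\[ d_G^2(\hat\beta, \beta) = \int\!\!\int (\hat\beta(s) - \beta(s))(\hat\beta(t) - \beta(t))\, E\Bigl(\frac{g'(\eta)^2}{\sigma^2(\mu)}\, X(s)X(t)\Bigr)\, dw(s)\, dw(t) \]

is approximated by $d_G^2(\hat\beta, \beta) = (\hat\beta - \beta)^T \Gamma (\hat\beta - \beta)$ for each $p$, where on the
right-hand side $\hat\beta$ and $\beta$ denote the coefficient vectors.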

In addition to the basic assumptions in Section 2 and usual conditions 
on variance and link functions, we require some technical conditions which 
restrict the growth of $p = p_n$ and the higher-order moments of the random
coefficients $\varepsilon_j$. Additional conditions are required for the semiparametric
(SPQR) case where both link and variance functions are assumed unknown 
and are estimated nonparametrically. 

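(M1) The link function $g$ is monotone, invertible and has two continuous
     bounded derivatives with $\|g'(\cdot)\| \le c$, $\|g''(\cdot)\| \le c$ for a constant $c > 0$.
     The variance function $\sigma^2(\cdot)$ has a continuous bounded derivative and
     there exists a $\delta > 0$ such that $\sigma(\cdot) \ge \delta$.
(M2) The number of predictor terms $p_n$ in the sequence of approximating
     $p_n$-truncated models (1) satisfies $p_n \to \infty$ and $p_n n^{-1/4} \to 0$ as $n \to \infty$.
(M3) It holds that [see (8), where the $\xi_{kl}$ are defined]

\[ \sum_{k_1, \ldots, k_4 = 0}^{p_n} E\Bigl(\varepsilon_{k_1} \varepsilon_{k_2} \varepsilon_{k_3} \varepsilon_{k_4}\, \frac{g'^4(\eta)}{\sigma^4(\mu)}\Bigr)\, \xi_{k_1 k_2}\, \xi_{k_3 k_4} = o(n / p_n). \]

(M4) It holds that

\[ \sum_{k_1, \ldots, k_8 = 0}^{p_n} E\Bigl(\frac{g'^4(\eta)}{\sigma^4(\mu)}\, \varepsilon_{k_1} \varepsilon_{k_3} \varepsilon_{k_5} \varepsilon_{k_7}\Bigr)\, E\Bigl(\frac{g'^4(\eta)}{\sigma^4(\mu)}\, \varepsilon_{k_2} \varepsilon_{k_4} \varepsilon_{k_6} \varepsilon_{k_8}\Bigr)\, \xi_{k_1 k_2}\, \xi_{k_3 k_4}\, \xi_{k_5 k_6}\, \xi_{k_7 k_8} = o(n^2 p_n^2). \]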

We are now in a position to state the central asymptotic result. Given $p = p_n$,
denote by $\hat\beta = (\hat\beta_0, \ldots, \hat\beta_p)^T$ the solution of the estimating equations (5),
(6) and by $\beta = (\beta_0, \ldots, \beta_p)^T$ the intercept $\alpha = \beta_0$ and the first $p$ coefficients
of the expansion of the parameter function $\beta(t) = \sum_{j=1}^{\infty} \beta_j \rho_j(t)$ in the basis
$\{\rho_j, j \ge 1\}$.

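Theorem 4.1. If the basic assumptions and (M1)-(M4) are satisfied,
then

\[ \frac{n (\hat\beta - \beta)^T \Gamma_{p_n} (\hat\beta - \beta) - (p_n + 1)}{\sqrt{2(p_n + 1)}} \xrightarrow{d} N(0, 1) \qquad \text{as } n \to \infty. \tag{15} \]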

We note that the matrix $\Gamma_{p_n}$ in Theorem 4.1 may be replaced by the
empirical version $\tilde\Gamma = \frac{1}{n}(D^T D)$; this is a consequence of (21), (22) and Lemma
7.2 below. Whenever only the "slope" parameters $\beta_1, \beta_2, \ldots$ but not the
intercept parameter $\alpha = \beta_0$ are of interest, $p_n$ is replaced by $p_n - 1$ and the
$(p + 1) \times (p + 1)$ matrix $\Gamma$ is replaced by the $p \times p$ submatrix of $\Gamma$ obtained
by deleting the first row/column.

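To study the convergence of the estimated parameter function $\hat\beta(\cdot)$, we
use the distance $d_G$ and the representation (14) with $R = G$, coupled with
the expansion

\[ \hat\beta(t) = \sum_{j} \hat\beta_{\rho_j^G}\, \rho_j^G(t) \]

of the estimated parameter function $\hat\beta(\cdot)$ in the basis $\{\rho_j^G, j = 1, 2, \ldots\}$, the
eigenbasis of operator $A_G$ with associated eigenvalues $\lambda_j^G$. We obtain

\[ d_G^2(\hat\beta(\cdot), \beta(\cdot)) = \int\!\!\int (\hat\beta(s) - \beta(s))\, G(s, t)\, (\hat\beta(t) - \beta(t))\, dw(s)\, dw(t) \]
\[ = \sum_{j=1}^{p} \lambda_j^G (\hat\beta_{\rho_j^G} - \beta_{\rho_j^G})^2 + \sum_{j=p+1}^{\infty} \lambda_j^G \beta_{\rho_j^G}^2 \]
\[ = (\hat\beta^G - \beta^G)^T\, \Gamma^G\, (\hat\beta^G - \beta^G) + \sum_{j=p+1}^{\infty} \lambda_j^G \beta_{\rho_j^G}^2. \]

Here

\[ \hat\beta^G = (\hat\beta_{\rho_1^G}, \ldots, \hat\beta_{\rho_p^G})^T, \qquad \beta^G = (\beta_{\rho_1^G}, \ldots, \beta_{\rho_p^G})^T, \]

and the diagonal matrix $\Gamma^G$ is obtained by replacing in the definition of the
matrix $\Gamma$ [see (8)] the $\varepsilon_j$ by $\varepsilon_j^G$ that are given by

\[ \varepsilon_j^G = \frac{g'(\eta)}{\sigma(\mu)} \int X(t)\, \rho_j^G(t)\, dw(t), \]

with the property

\[ E(\varepsilon_i^G \varepsilon_j^G) = \int\!\!\int G(s, t)\, \rho_i^G(s)\, \rho_j^G(t)\, dw(s)\, dw(t) = \delta_{ij} \lambda_j^G. \tag{16} \]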

These considerations lead under appropriate moment conditions to the fol- 
lowing: 



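Corollary 4.1. If the parameter function $\beta(\cdot)$ has the property that

\[ \sum_{j=p+1}^{\infty} E\bigl((\varepsilon_j^G)^2\bigr) \Bigl[\int \beta(t)\, \rho_j^G(t)\, dw(t)\Bigr]^2 = o\Bigl(\frac{\sqrt{p_n}}{n}\Bigr), \tag{17} \]

then

\[ \frac{n \int\!\!\int (\hat\beta(s) - \beta(s))(\hat\beta(t) - \beta(t))\, G(s, t)\, dw(s)\, dw(t) - (p_n + 1)}{\sqrt{2 p_n + 1}} \xrightarrow{d} N(0, 1) \]

as $n \to \infty$.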

We note that property (17) relates to the rate at which higher-order
oscillations, relative to the oscillations of processes $X(t)$, contribute to the $L^2$
norm of the parameter function $\beta(\cdot)$.

In the case of unknown link and variance functions (SPQR), one applies
scatterplot smoothing to obtain nonparametric estimates of functions and
derivatives and then obtains the parameter estimates $\hat\beta$ as solutions of the
semiparametric score equation (9). After iteration, final nonparametric
estimates of the link function $g$, its derivative $g'$ and of the variance function $\sigma^2$
are obtained. We implement these nonparametric curve estimators with local
linear or quadratic kernel smoothers, using a bandwidth $h$ in the smoothing
step. For the following result we assume these conditions:

(R1) The regularity conditions (M1)-(M6) and (K1)-(K3) of Chiou and
     Müller (1998) hold uniformly for all $p_n$.
(R2) For the bandwidths h of the nonparametric function estimates for 

link and variance function, h — > 0, — > oo and II i?, ;, r~ 1//2 || — > as 

n — ► oo. 

The following result refers to the matrix

\[ \hat\Gamma = (\hat\gamma_{kl})_{1 \le k,l \le p_n}, \qquad \hat\gamma_{k,l} = \frac{1}{n} \sum_{i=1}^{n} \Bigl(\frac{\hat g'^2(\hat\eta_i)}{\hat\sigma^2(\hat\mu_i)}\, \varepsilon_{ki}\, \varepsilon_{li}\Bigr). \tag{18} \]

Corollary 4.2. Assume (R1) and (R2) and replace the matrix $\Gamma$ in
(15) by the matrix $\hat\Gamma$ from (18). Then (15) remains valid for the
semiparametric quasi-likelihood (SPQR) estimates $\hat\beta$ that are obtained as solutions of
the semiparametric estimating equation (9), substituting nonparametrically
estimated link and variance functions.
Extending the arguments used in the proofs of Theorems 1 and 2 in
Chiou and Müller (1998), and assuming additional regularity conditions as
described there, we find for these nonparametric function estimates, 



sup 

t 



Assuming that h — > 0, — ► oo and II J^.^ T 1//2 || — > 0, we obtain from the 

& ' logn H ^fnh i n ' 

boundedness of the design density of the linear predictors away from 0 and
$\infty$ that

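where the $o_p$-terms are uniform in $p$ following (M2). Therefore, the matrix
$\hat\Gamma$ approximates the elements of the matrix

\[ \tilde\Gamma = \frac{1}{n}(D^T D) = (\tilde\gamma_{kl})_{1 \le k,l \le p_n}, \qquad \tilde\gamma_{k,l} = \frac{1}{n} \sum_{i=1}^{n} \Bigl(\frac{g'^2(\eta_i)}{\sigma^2(\mu_i)}\, \varepsilon_{ki}\, \varepsilon_{li}\Bigr), \]

uniformly in $k, l$ and $p_n$. This, together with the remarks after Theorem 4.1,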
justifies the extension to the semiparametric (SPQR) case with unknown link 
and variance functions. This case will be included in the following, unless 
noted otherwise. 

A common problem of inference in regression models is testing for no
regression effect, that is, $H_0 : \beta \equiv \text{const}$, which is a special case of testing
for $H_0 : \beta \equiv \beta_0$ for a given regression parameter function $\beta_0$. With the
representation $\beta_0(t) = \sum_j \beta_{0j} \rho_j(t)$, the null hypothesis becomes $H_0 : \beta_j = \beta_{0j}$,
$j = 0, 1, 2, \ldots$, and $H_0$ is rejected when the test statistic in Theorem 4.1
exceeds the critical value $\Phi^{-1}(1 - \alpha)$, for the case of a fully specified link
function. Through a judicious choice of the orthonormal basis $\{\rho_j, j = 1, 2, \ldots\}$,
these tests also include null hypotheses of the type $H_0 : \int \beta(t) h_j(t)\, dw(t) = \tau_j$,
$j = 1, 2, \ldots$, for a sequence of linearly independent functions $h_j$; these are
transformed into an orthonormal basis by Gram-Schmidt orthonormalization,
whence it is easy to see that these null hypotheses translate into
$H_0 : \beta_j = \tau'_j$, $j = 1, 2, \ldots$, for suitable $\tau'_j$ if we use the new orthonormal basis
in lieu of the $\{\rho_j, j \ge 1\}$. For alternative approaches to testing in functional
regression, we refer to Fan and Lin (1998).
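The test based on Theorem 4.1 can be sketched as below: the quadratic form in the coefficient differences is centered by $p + 1$, scaled by $\sqrt{2(p + 1)}$, and compared with the upper normal quantile. This is a minimal sketch; the function name `gflm_test`, the identity matrix used for $\Gamma$ and the numerical inputs are assumptions for illustration, and in practice $\Gamma$ would be replaced by an empirical version.

```python
import math
from statistics import NormalDist

def gflm_test(beta_hat, beta_null, Gamma, n, alpha=0.05):
    """Standardized statistic of (15) and the one-sided rejection decision."""
    p1 = len(beta_hat)                    # p + 1 coefficients, intercept included
    d = [bh - b0 for bh, b0 in zip(beta_hat, beta_null)]
    quad = sum(d[k] * Gamma[k][l] * d[l] for k in range(p1) for l in range(p1))
    z = (n * quad - p1) / math.sqrt(2.0 * p1)
    crit = NormalDist().inv_cdf(1.0 - alpha)   # Phi^{-1}(1 - alpha)
    return z, crit, z > crit

# Hypothetical inputs: p = 2 components plus intercept, n = 100 subjects.
Gamma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z, crit, reject = gflm_test([0.1, 0.5, -0.3], [0.0, 0.0, 0.0], Gamma, n=100)
```

With these inputs the quadratic form equals 0.35, so the statistic is $(35 - 3)/\sqrt{6}$, far above the 5% critical value, and the null hypothesis would be rejected.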

Another application of practical interest is the construction of confidence
bands for the unknown regression parameter function $\beta$. In a finite sample
situation for which $p = p_n$ is given and estimates $\hat\beta$ for $p$-vectors $\beta$ have
been determined, an asymptotic $(1 - \alpha)$ confidence region for $\beta$ according
to Theorem 4.1 is given by $(\hat\beta - \beta)^T \Gamma (\hat\beta - \beta) \le c(\alpha)$, where
$c(\alpha) = [p + 1 + \sqrt{2(p + 1)}\, \Phi^{-1}(1 - \alpha)]/n$, and $\Gamma$ may be replaced by its empirical
counterparts $\tilde\Gamma$ or $\hat\Gamma$. More precisely, we have the following:
Corollary 4.3. Denote the eigenvectors/eigenvalues of the matrix $\Gamma$
[see (8)] by $(e_1, \lambda_1), \ldots, (e_{p+1}, \lambda_{p+1})$, and let

\[ e_k = (e_{k1}, \ldots, e_{k,p+1})^T, \qquad \omega_k(t) = \sum_{l=1}^{p+1} \rho_l(t)\, e_{kl}, \qquad k = 1, \ldots, p + 1. \]

Then, for large $n$ and $p_n$, an approximate $(1 - \alpha)$ simultaneous confidence
band is given by

\[ \hat\beta(t) \pm \sqrt{c(\alpha) \sum_{k=1}^{p+1} \frac{\omega_k(t)^2}{\lambda_k}}. \tag{19} \]

A practical simultaneous band is obtained by substituting estimates for
$\omega_k$ and $\lambda_k$ that result from empirical matrices $\tilde\Gamma$ or $\hat\Gamma$ instead of $\Gamma$.

5. Simulation study and model selection. 

5.1. Model order selection. An auxiliary parameter of importance in the 
estimation procedure is the number p of eigenfunctions that are used in 
fitting the function $\beta(t)$. This number has to be chosen by the statistician.
An appealing method is the Akaike information criterion (AIC), due to its 
affinity to increasing model orders, and, in addition, we found AIC to work 
well in practice. We discuss here the consistency of AIC for choosing p in 
the context of the generalized linear model with full likelihood and known 
link function. 

Assume the linear predictor vector $\eta_p$ consists of $n$ components
$\eta_{p,i} = \sum_{j=0}^{p} \varepsilon_j^{(i)} \beta_j$, $i = 1, \ldots, n$, the vector $\hat\eta_p$ of the components
$\hat\eta_{p,i} = \sum_{j=0}^{p} \varepsilon_j^{(i)} \hat\beta_j$ and the vector $\eta$ of the components $\sum_{j=0}^{\infty} \varepsilon_j^{(i)} \beta_j$. Let $G$ be the
antiderivative of the (inverse) link function $g$ so that $Y$ has the density (in canonical form)
$f_Y(y) = \exp(y\eta + a(y) - G(\eta))$. In particular, $\tilde\sigma^2(\eta) = g'(\eta)$. The deviance is

\[ \mathcal{D} = -2\ell_n(Y, \hat\eta_p) + 2\ell_n(Y, g^{-1}(Y)), \]

with log-likelihood

\[ \ell_n(Y, \hat\eta_p) = \sum_{i=1}^{n} Y_i \hat\eta_{i,p} - \sum_{i=1}^{n} G(\hat\eta_{i,p}). \]

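Taylor expansion yields

\[ -2\ell_n(Y, \hat\eta_p) = -2\ell_n(Y, \eta_p) + 2\bigl(\nabla_{\beta_p} \ell_n(Y, \hat\eta_p)\bigr)^T (\hat\beta_p - \beta_p) + \text{(quadratic form in } \hat\beta_p - \beta_p\text{)}, \]

where the second term on the right-hand side is zero, due to the score
equation, and the matrix in the quadratic form is essentially $(D^T D)$. It
follows from the proof of Theorem 4.1 that the quadratic form
$n(\hat\beta_p - \beta_p)^T (\frac{1}{n} D^T D)(\hat\beta_p - \beta_p)$ has asymptotic expectation $p$. Since

\[ -2\ell_n(Y, \eta_p) = -2\ell_n(Y, \eta) - 2\sum_{i=1}^{n} (Y_i - g(\eta_i))(\eta_{i,p} - \eta_i) + \sum_{i=1}^{n} g'(\tilde\eta_i)(\eta_{i,p} - \eta_i)^2, \]

with intermediate values $\tilde\eta_i$, we arrive at

\[ E(\mathcal{D}) = n \sum_{k,l=p+1}^{\infty} E\bigl(g'(\tilde\eta)\, \varepsilon_k \varepsilon_l\bigr) \beta_k \beta_l - p(1 + o(1)) + E_n \]
\[ = n \sum_{k,l=p+1}^{\infty} E\bigl(g'(\eta)\, \varepsilon_k \varepsilon_l\bigr) \beta_k \beta_l - p(1 + o(1)) + E_n, \]

where $E_n$ is an expression that does not depend on $p$.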

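Applying the law of large numbers, and similar considerations as in
Section 7, we find $\mathcal{D}/E(\mathcal{D}) \xrightarrow{p} 1$, as long as $p$ is chosen in $(p_0, c\, n^{1/4})$. Next,
applying results of Section 7,

\[ d(\hat\beta(\cdot), \beta(\cdot)) = \int\!\!\int (\hat\beta(s) - \beta(s))\, G(s, t)\, (\hat\beta(t) - \beta(t))\, dw(s)\, dw(t) \]
\[ = (\hat\beta_p - \beta_p)^T \Gamma (\hat\beta_p - \beta_p) + \sum_{k,j=p+1}^{\infty} \gamma_{jk} \beta_j \beta_k + 2 \sum_{j=1}^{p} \sum_{k=p+1}^{\infty} \gamma_{jk} (\beta_j - \hat\beta_j) \beta_k, \]

where $\gamma_{jk} = E(g'(\eta)\, \varepsilon_j \varepsilon_k)$. We obtain
$E(d(\hat\beta(\cdot), \beta(\cdot))) = \frac{p}{n}(1 + o(1)) + \sum_{j,k=p+1}^{\infty} \gamma_{jk} \beta_j \beta_k (1 + o(1))$.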

This analysis shows that the target function $d(\hat\beta(\cdot), \beta(\cdot))$ to be minimized
is asymptotically close to $E(\mathcal{D}/n) + 2p/n$. This suggests that we are in the
situation considered by Shibata (1981) for sequences of linear models with
normal residuals and by Shao (1997) for the more general case. While the
closeness of the target function and AIC is suggestive, a rigorous proof that
the order $p_A$ selected by AIC and the order $p_d$ that minimizes the target
function satisfy $p_d/p_A \to 1$ in probability as $n \to \infty$, or a stronger consistency
or efficiency result, requires additional analysis that is not provided here.
One difficulty is that the usual normality assumption is not satisfied as one
operates in an exponential family or quasi-likelihood setting.

In practice, we implement AIC and the alternative Bayesian information criterion BIC by first obtaining the deviance or quasi-deviance $\mathcal{D}(p)$, dependent on the model order $p$. This is straightforward in the quasi-likelihood or maximum likelihood case with known link function, and requires integrating the score function to obtain the analogue of the log-likelihood in the SPQR case with unknown link function. Once the deviance is obtained, we choose the minimizing argument of

\[
C(p) = \mathcal{D}(p) + \mathcal{P}(p), \tag{20}
\]

where $\mathcal{P}$ is the penalty term, chosen as $2p$ for the AIC and as $p\log n$ for the BIC.
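
The selection rule (20) can be sketched in a few lines. The following is a minimal illustration, not the implementation used for the results reported here; the function name `select_order` and the toy deviance values are hypothetical, and the deviances are assumed to have been computed beforehand for each candidate order $p$.

```python
import math

def select_order(deviances, n, criterion="aic"):
    """Return (p, C(p)) minimizing C(p) = D(p) + P(p), as in (20).

    deviances: dict mapping model order p -> deviance D(p) (assumed precomputed)
    n: sample size, used only for the BIC penalty p * log(n)
    """
    if criterion == "aic":
        penalty = lambda p: 2.0 * p
    elif criterion == "bic":
        penalty = lambda p: p * math.log(n)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    scores = {p: d + penalty(p) for p, d in deviances.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Toy deviances (made-up numbers, for illustration only).
toy_deviances = {1: 720.0, 2: 700.0, 3: 690.0, 4: 687.0, 5: 686.5}
p_aic, c_aic = select_order(toy_deviances, n=534, criterion="aic")
p_bic, c_bic = select_order(toy_deviances, n=534, criterion="bic")
```

With these toy values the heavier BIC penalty selects a smaller order than AIC, which is the typical qualitative difference between the two criteria.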

Several alternative selectors that we studied were found to be less stable 
and more computer intensive in simulations. These included minimization 
of the leave-one-out prediction error, of the leave-one-out misclassification 
rate via cross-validation [Rice and Silverman (1991)], and of the relative difference between the Pearson criterion and the deviance [Chiou and Müller (1998)].

5.2. Monte Carlo study. Besides choosing the number $p$ of components to include, an implementation of the proposed generalized functional linear model also requires choice of a suitable orthonormal basis $\{\rho_j, j = 1, 2, \ldots\}$. Essentially one has two options: using a fixed standard basis such as the Fourier basis $\rho_j = \varphi_j = \sqrt{2}\sin(\pi j t)$, $t \in [0,1]$, $j \ge 1$, or, alternatively, estimating the eigenfunctions of the covariance operator $A_K$ (11), (12) from the data, with the goal of achieving a sparse representation. We implemented this second option following an algorithm for the estimation of eigenfunctions which is described in detail in Capra and Müller (1997); see also Rice and Silverman (1991). Once the number of model components $p$ has been determined, the $i$th observed process is reduced to the $p$ predictors $\varepsilon_j^{(i)} = \int X_i(t)\rho_j(t)\,dw(t)$, $j = 1, \ldots, p$. We substitute the estimated eigenfunctions for the $\rho_j$ and evaluate the integrals numerically.
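
The reduction of each curve to $p$ scalar predictors can be illustrated as follows. This is a minimal sketch under simplifying assumptions: the fixed Fourier basis is used instead of estimated eigenfunctions, the weight is taken as $dw(t) = dt$, the curves are observed on a common grid, and the integrals are approximated by the trapezoidal rule. All function names are hypothetical.

```python
import numpy as np

def fourier_basis(t, p):
    """Rows j = 1..p of sqrt(2) * sin(pi * j * t), evaluated on the grid t in [0, 1]."""
    j = np.arange(1, p + 1)[:, None]
    return np.sqrt(2.0) * np.sin(np.pi * j * t[None, :])

def trapezoid(y, t):
    """Trapezoidal rule along the last axis of y, for grid points t."""
    dt = np.diff(t)
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dt, axis=-1)

def scores(X, basis, t):
    """scores[i, j] approximates the integral of X_i(t) * rho_j(t) dt (dw(t) = dt assumed)."""
    return trapezoid(X[:, None, :] * basis[None, :, :], t)

# One test curve with known coefficients (2, 0, -0.5) in the first three basis functions.
t = np.linspace(0.0, 1.0, 501)
basis = fourier_basis(t, 3)
X = 2.0 * basis[0:1] - 0.5 * basis[2:3]
eps = scores(X, basis, t)
```

Because the basis is orthonormal, the computed scores recover the coefficients of the test curve up to quadrature error.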

Once we have reduced the infinite-dimensional model (1) to its $p$-truncated approximation (3), we are in the realm of finite-dimensional generalized linear and quasi-likelihood models. The parameters $\alpha$ and $\beta_1, \ldots, \beta_p$ in the $p$-truncated generalized functional model are estimated by solving the respective score equation. We adopted the weighted iterated least squares algorithm which is described in McCullagh and Nelder (1989) for the case of a generalized linear or quasi-likelihood model with known link function, and the QLUE algorithm described in Chiou and Müller (1998) for the SPQR model with unknown link function.
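
For the known-link case, the iterated weighted least squares step can be sketched as below for the logit link. This is a generic textbook version of the algorithm, given only as an illustration under the assumption of binary responses and a design matrix of scores; it is not the code used for the reported results, and it does not cover the QLUE algorithm. The function name `irls_logistic` and the simulated inputs are hypothetical.

```python
import numpy as np

def irls_logistic(Z, y, n_iter=25, tol=1e-8):
    """Iteratively reweighted least squares for logistic regression.

    Z: (n, p) matrix of predictors (e.g. scores); a column of ones is prepended for alpha.
    y: (n,) array of 0/1 responses.
    Returns the coefficient vector (alpha, beta_1, ..., beta_p).
    """
    n = Z.shape[0]
    A = np.column_stack([np.ones(n), Z])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = A @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        # For the logit link, g'(eta) equals the Bernoulli variance mu * (1 - mu).
        w = np.clip(mu * (1.0 - mu), 1e-10, None)
        z = eta + (y - mu) / w  # working response
        beta_new = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated example with made-up coefficients (0.5, 1.0, -1.0).
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 2))
eta_true = 0.5 + Z @ np.array([1.0, -1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))
coef = irls_logistic(Z, y)
```

At convergence the returned coefficients solve the score equation $\sum_i (Y_i - \mu_i)\,\varepsilon^{(i)} = 0$ of the logistic model, which can be checked directly.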

The purpose of our Monte Carlo study was to compare AIC and BIC as selection criteria for the order $p$, to study the power of statistical tests for regression effect in a generalized functional regression model and, finally, to investigate the behavior of the semiparametric SPQR procedure for functional regression, in comparison to the maximum or quasi-likelihood implementation with a fully specified link function. The design was as follows: Pseudo-random processes based on the first 20 functions from the Fourier base, $X(t) = \sum_{j=1}^{20}\varepsilon_j\varphi_j(t)$, were generated by using normal pseudo-random variables $\varepsilon_j \sim N(0, 1/j^2)$, $j \ge 1$. Choosing $\beta_j = 1/j$, $1 \le j \le 3$, $\beta_0 = 1$, $\beta_j = 0$, $j > 3$, we defined $\beta(t) = \sum_{j=1}^{20}\beta_j\varphi_j(t)$ and $p(X(\cdot)) = g(\beta_0 + \sum_{j=1}^{20}\beta_j\varepsilon_j)$, choosing logit link [with $g(x) = \exp(x)/(1+\exp(x))$] and c-loglog link [with $g(x) = \exp(-\exp(-x))$]. Then we generated responses $Y(X) \sim \mathrm{Binomial}(p(X), 1)$ as pseudo-Bernoulli r.v.s with probability $p(X)$, obtaining a sample $(X_i(t), Y_i)$, $i = 1, \ldots, n$. Estimation methods included generalized functional linear modeling with logit, c-loglog and unspecified (SPQR) link functions.
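
The data-generating design just described can be sketched as follows. This is a minimal sketch under the stated design, working directly with the random coefficients $\varepsilon_j$ rather than with curves on a grid; the function name `simulate`, the seed handling and the scaling argument `delta` (which multiplies the whole parameter vector, as in the power study) are assumptions made for illustration.

```python
import numpy as np

def simulate(n, link="logit", n_basis=20, delta=1.0, seed=0):
    """Generate scores, Bernoulli responses and success probabilities.

    eps[i, j-1] ~ N(0, 1/j^2); beta_0 = 1, beta_j = 1/j for j <= 3, beta_j = 0 otherwise.
    delta scales the full parameter vector (beta_0, beta_1, ...).
    """
    rng = np.random.default_rng(seed)
    j = np.arange(1, n_basis + 1)
    eps = rng.normal(0.0, 1.0 / j, size=(n, n_basis))  # standard deviation 1/j
    beta = np.zeros(n_basis)
    beta[:3] = 1.0 / j[:3]
    beta0 = 1.0
    eta = delta * (beta0 + eps @ beta)
    if link == "logit":
        prob = 1.0 / (1.0 + np.exp(-eta))
    elif link == "cloglog":
        prob = np.exp(-np.exp(-eta))  # link as defined in the text
    else:
        raise ValueError("link must be 'logit' or 'cloglog'")
    y = rng.binomial(1, prob)
    return eps, y, prob

eps_sim, y_sim, prob_sim = simulate(200, link="logit")
```

Curves on a grid, if needed, are obtained by multiplying the returned coefficient matrix with the Fourier basis functions evaluated on that grid.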

In results not shown here, a first finding was that the AIC performed somewhat better than BIC overall, in line with theoretical expectations, and, therefore, we used AIC in the data applications. To demonstrate the asymptotic results, in particular, Theorem 4.1, we obtained empirical power functions for data generated and analyzed with the logit link, using the test statistic $T$ on the left-hand side of (16) to test the null hypothesis of no regression effect $H_0: \beta_j = 0$, $j = 1, 2, \ldots$. This test was implemented as a one-sided test at the 5% level, that is, rejection was recorded whenever $|T| > \Phi^{-1}(0.95)$. The average rejection rate was determined over 500 Monte Carlo runs, for sample sizes $n = 50, 200$, as a function of $\delta$, $0 \le \delta \le 2$, where the underlying parameter vector was as described in the preceding paragraph, multiplied by $\delta$, and is given by $(\delta, \delta, \delta/2, \delta/3)$. The resulting power functions are shown in Figure 1 and demonstrate that sample size plays a critical role.
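
The rejection rule and the empirical power computation can be sketched as below. The sketch assumes that the values of the test statistic $T$ from the Monte Carlo runs are already available; computing $T$ itself requires the quantities in (16) and is not shown. The names `rejects` and `empirical_power` are hypothetical.

```python
from statistics import NormalDist

# Critical value Phi^{-1}(0.95) of the standard normal distribution.
CRITICAL_VALUE = NormalDist().inv_cdf(0.95)

def rejects(T, critical=CRITICAL_VALUE):
    """Rejection rule as stated in the text: reject when |T| exceeds the critical value."""
    return abs(T) > critical

def empirical_power(test_statistics):
    """Fraction of Monte Carlo runs in which the null hypothesis is rejected."""
    stats = list(test_statistics)
    return sum(rejects(T) for T in stats) / len(stats)

# Made-up values of T from four hypothetical runs.
power_demo = empirical_power([0.0, 2.0, -3.0, 1.0])
```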

To demonstrate the usefulness of the SPQR approach with automatic link 
estimation, we calculated the means of the estimated regression parameter functions $\hat\beta(\cdot)$ over 50 Monte Carlo runs for the following cases: In each run, 1000 samples were generated with either the logit or c-loglog link function and the corresponding functions $\beta(\cdot)$ were estimated in three different ways: assuming a logit link, a c-loglog link and assuming no link, using the SPQR
method. The resulting mean function estimates can be seen in Figure 2. One 
finds that misspecification of the link function can lead to serious problems 
with these estimates and that the flexibility of the SPQR approach entails 
a clear advantage over methods where a link function must be specified a 
priori. 

6. Application to medfly data and classification. It is a long-standing 
problem in evolution and ecology to analyze the interplay of longevity and re- 
production. On one hand, longevity is a prerequisite for reproduction; on the 
other hand, numerous articles have been written about a "cost of reproduction," which is the concept that a high degree of reproduction inflicts a damage on the organism and shortens its lifespan [see, e.g., Partridge and Harvey (1985)]. The precise nature of this cost of reproduction remains elusive.

Studies with Mediterranean fruit flies (Ceratitis capitata), or medflies for
short, have been of considerable interest in pursuing these questions as hun- 
dreds of flies can be reared simultaneously and their daily reproduction 
activity can be observed by simply counting the daily eggs laid by each in- 
dividual fly, in addition to recording its lifetime [Carey et al. (1998a, b)]. 
For each medfly, one may thus obtain a reproductive trajectory and one 
can then ask the operational question whether particular features of this 
random curve have an impact on subsequent mortality [see Müller et al.
(2001) for a parametric approach and Chiou, Müller and Wang (2003) for a
functional model, where the egg-laying trajectories are viewed as response]. 
In the present framework we cast this as the problem to predict whether a 
fly is short- or long-lived after an initial period of egg-laying is observed. We adopt a functional binomial regression model where the initial egg-laying trajectory is the predictor process and the subsequent longevity status of the fly is the response. Of particular interest is the shape of the parameter function $\beta(\cdot)$, as it provides an indication as to which features of the egg-laying process are associated with the longevity of a fly.

Fig. 1. Empirical power functions for the significance test for a functional logistic regression effect at the 5% level, plotted against $\delta$. Based on 500 simulations, for sample sizes 50 (dashed) and 200 (solid), with $p = 3$.

Fig. 2. Average estimates of the regression parameter function $\beta(\cdot)$ obtained over 50 Monte Carlo runs from data generated either with the logit link (left panel) or with the c-loglog link (right panel). Each panel displays the target function (solid), and estimates obtained assuming the logit link (dashed), the c-loglog link (dash-dot) and the SPQR method incorporating nonparametric link function estimation (dotted).

From the one thousand medflies described in Carey et al. (1998a), we select flies which lived past 34 days, providing us with a sample of 534 medflies. For prediction, we use egg-laying trajectories from 0 to 30 days, slightly smoothed to obtain the predictor processes $X_i(t)$, $t \in [0, 30]$, $i = 1, \ldots, 534$. A
fly is classified as long-lived if the remaining lifetime past 30 days is 14 days 
or longer, otherwise as short-lived. Of the n = 534 flies, 256 were short-lived 
and 278 were long-lived. We apply the algorithm as described in the previous 
section, choosing the logit link, fitting a logistic functional regression. 

Plotting the reproductive trajectories for the long-lived and short-lived 
flies separately (upper panels of Figure 3), no clear visual differences be- 
tween the two groups can be discerned. Failure to visually detect differences 
between the two groups could result from overcrowding of these plots with 
too many curves, but when displaying fewer curves (lower panels of Fig- 
ure 3), this remains the same. Therefore, the discrimination task at hand is 
difficult, as at best subtle and hard to discern differences exist between the 
trajectories of the two groups. 

We use the Akaike information criterion (AIC) for choosing the number of 
model components. As can be seen from Figure 4, where the AIC criterion is 
shown in dependency on the model order p, this leads to the choice p = 6. The 
cross-validation prediction error criterion $PE = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat p_i^{(-i)})^2$, where $\hat p_i^{(-i)}$ is the leave-one-out estimate for $p_i$, supports a similar choice. The leave-one-out misclassification rate estimates are, for the group of long-lived flies, 37% with logit link and 35% for the nonparametric SPQR link, while for the group of short-lived flies these are 47% for logit and 48% for SPQR, demonstrating the difficulty of classifying short-lived flies correctly.
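
The leave-one-out quantities can be computed generically as sketched below. This is an illustration only: the callables `fit` and `predict` stand for any fitting procedure (for example a logistic fit on the scores), the classification threshold 0.5 is an assumption not stated in the text, and the base-rate "model" in the demonstration is a deliberately trivial stand-in.

```python
import numpy as np

def leave_one_out(Z, y, fit, predict):
    """Leave-one-out prediction error and group-wise misclassification rates.

    fit(Z_train, y_train) -> model; predict(model, Z_new) -> array of probabilities.
    Returns (PE, rates) with PE = mean((y_i - p_i^(-i))^2) and
    rates[g] = fraction of group g (0 or 1) that is misclassified at threshold 0.5.
    """
    n = len(y)
    p_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(Z[keep], y[keep])
        p_loo[i] = predict(model, Z[i:i + 1])[0]
    pe = float(np.mean((y - p_loo) ** 2))
    labels = (p_loo >= 0.5).astype(int)
    rates = {g: float(np.mean(labels[y == g] != g)) for g in (0, 1)}
    return pe, rates

# Trivial stand-in model: predict the training base rate for every observation.
fit_base_rate = lambda Z_train, y_train: float(np.mean(y_train))
predict_base_rate = lambda model, Z_new: np.full(len(Z_new), model)

Z_demo = np.zeros((4, 1))
y_demo = np.array([1, 1, 1, 0])
pe_demo, rates_demo = leave_one_out(Z_demo, y_demo, fit_base_rate, predict_base_rate)
```

Reporting the misclassification rate separately for each group, as done in the text, makes it visible when one group is much harder to classify than the other.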

The fitted regression parameter functions $\hat\beta(\cdot)$ for both logistic (logit link) and SPQR (nonparametric link) functional regression, along with simultaneous confidence bands (19), are shown in Figure 5; we find that the estimate
with nonparametric link is quite close to the estimate employing the logistic 
link, thus providing some support for the choice of the logistic link in this 
case. The asymptotic confidence bands allow us to conclude that the parameter function has a steep rise at the right end towards age 30 days, and that the null hypothesis of no effect would be rejected.

The shape of the parameter function $\beta(\cdot)$ highlights periods of egg-laying that are associated with increased longevity. We note that under the logit link function, the predicted classification probability for a long-lived fly is $g(\eta) = \exp(\eta)/(1+\exp(\eta))$. Overlaid with this expit-function, the nonparametric link function estimate that is employed in SPQR is shown in Figure 6 (choosing local linear smoothing and the bandwidth 0.55 for the smoothing steps), along with the corresponding indicator data from the last iteration step. For both links, larger linear predictors $\eta$, and therefore larger values of the parameter regression function $\beta(\cdot)$, are seen to be associated with an increased chance for longevity.

Fig. 3. Predictor trajectories, corresponding to slightly smoothed daily egg-laying curves, for $n = 534$ medflies, plotted against time (days). The reproductive trajectories for 256 short-lived medflies are in the upper left and those for 278 long-lived medflies in the upper right panel. Randomly selected profiles from the panels above are shown in the lower panels for 50 medflies.

Fig. 4. Akaike information criterion (AIC) as a function of the number of model components $p$ for the medfly data.

Since the parameter function is relatively large between about 12-17 days 
and past 26 days, we conclude that heavy reproductive activity during these 
periods is associated with increased longevity. In contrast, increased repro- 
duction between 8-12 days and 20-26 days is associated with decreased 
longevity. A high level of late reproduction emerges as a significant and 
overall as the strongest indicator of longevity in our analysis. This is of 
biological significance since it implies that increased late reproduction is 
associated with increased longevity and may have a protective effect. Increased reproduction during the peak egg-laying period around 10 days has previously been associated with a cost of reproduction, an association that is supported by our analysis.

Fig. 5. The regression parameter function estimates $\hat\beta(\cdot)$ (19) (solid) for the medfly classification problem, with simultaneous confidence bands (5) (dashed). Left panel: Logit link. Right panel: Nonparametric link, using the SPQR algorithm.

7. Proof of Theorem 4.1 and auxiliary results. Proofs of the auxiliary results in this section are provided in the Appendix. Throughout, we assume that all assumptions of Theorem 4.1 are satisfied and work with the matrices $\Gamma = (\gamma_{kl})$, $\Xi = \Gamma^{-1} = (\xi_{kl})$, $0 \le k,l \le p$, defined in (8) and also with the matrix $\Xi^{1/2} =: (\xi_{kl}^{(1/2)})$, $0 \le k,l \le p$. We will use both versions $\sigma(\cdot)$ and $\tilde\sigma(\cdot)$ to represent the variance function, depending on the context, noting that $\sigma(\mu) = \tilde\sigma(\eta)$, and the notation $\hat\beta$, $\beta$ for the $(p_n+1)$-vectors defined before Theorem 4.1 and $\beta(\cdot)$ for the parameter function.

Fig. 6. Logit link (dashed) and nonparametric link function (solid) obtained via the SPQR algorithm, with overlaid group indicators, versus level of linear predictor $\eta$.

For the first step of the proof of Theorem 4.1, we adopt the usual Taylor expansion based approach for showing asymptotic normality for an estimator which is defined through an estimating equation; see, for example, McCullagh (1983). Writing the Hessian of the quasi-likelihood as $J_\beta = \frac{\partial}{\partial\beta}U(\beta)$ and noting that

\[
D^TD = \sum_{i=1}^{n} g'^2(\eta_i)\,\varepsilon^{(i)}\varepsilon^{(i)T}/\tilde\sigma^2(\eta_i),
\]

we obtain



J P = Y,Q-W(m)e®(Yi - 9(rii))/* 2 (9(Vi))] ■ ±fflH 

i=l < % 



j=l 



D T D + R, say. 



We aim to show that the remainder term $R$ can eventually be neglected. By a Taylor expansion, for a $\tilde\beta$ between $\hat\beta$ and $\beta$,

\[
U(\beta) = U(\hat\beta) - J_{\tilde\beta}(\hat\beta-\beta) = -J_{\tilde\beta}(\hat\beta-\beta)
= -[D^TD(\hat\beta-\beta) + (J_{\tilde\beta}-J_\beta)(\hat\beta-\beta) + (J_\beta - D^TD)(\hat\beta-\beta)].
\]

Denoting the $q \times q$ identity matrix by $I_q$, this leads to
V^CS -0) = V^(D T D + (Js - J p ) + (J p - D T D)y 1 U((3) 



'P 

fD T Dy 1 fJp-J f 3\ + (D T D\- X (Jp-D T D 



V n 



n 



n 



D T D 



v 



U(P) 



11 



Using the matrix norm $\|M\|_2 = (\sum_{k,l} m_{kl}^2)^{1/2}$, we find (see Appendix for the proof) the following:

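Lemma 7.1. As $n \to \infty$,

\[
\Big\| \sqrt{n}(\hat\beta-\beta) - \Big(\frac{D^TD}{n}\Big)^{-1}\frac{U(\beta)}{\sqrt{n}} \Big\|_2 = o_p(1).
\]

The asymptotically prevailing term is seen to be

\[
\sqrt{n}(\hat\beta-\beta) \sim \Big(\frac{D^TD}{n}\Big)^{-1}\frac{U(\beta)}{\sqrt{n}},
\]

corresponding to

\[
Z_n = \Big(\frac{D^TD}{n}\Big)^{-1}\frac{D^TV^{-1/2}(Y-\mu)}{\sqrt{n}} = \Big(\frac{D^TD}{n}\Big)^{-1}\frac{D^TV^{-1/2}e}{\sqrt{n}} = \Big(\frac{D^TD}{n}\Big)^{-1}\frac{D^Te'}{\sqrt{n}}.
\]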



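Of interest is then the asymptotic distribution of $Z_n^T\Gamma Z_n$. Defining $(p+1)$-vectors $X_n$ and $(p+1)\times(p+1)$-matrices $\Psi_n$ by

\[
X_n = \frac{\Xi^{1/2}D^Te'}{\sqrt{n}}, \qquad \Psi_n = \Gamma^{1/2}\Big(\frac{D^TD}{n}\Big)^{-1}\Gamma^{1/2}, \tag{21}
\]

we may decompose this into three terms,

\[
Z_n^T\Gamma Z_n = X_n^T\Psi_n^2X_n \tag{22}
\]
\[
= X_n^TX_n + 2X_n^T(\Psi_n - I_{p_n+1})X_n + X_n^T(\Psi_n - I_{p_n+1})(\Psi_n - I_{p_n+1})X_n \tag{23}
\]
\[
= F_n + G_n + H_n, \quad \text{say.} \tag{24}
\]

The following lemma is instrumental, as it implies that in deriving the limit distribution, $G_n$ and $H_n$ are asymptotically negligible as compared to $F_n$.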

Lemma 7.2. Under the conditions

(M3') $p_n = o(n^{1/3})$,

(M4') $\sum_{k_1,\ldots,k_4=0}^{p_n} E\big(\varepsilon_{k_1}\varepsilon_{k_2}\varepsilon_{k_3}\varepsilon_{k_4}\,g'^4(\eta)/\sigma^4(\mu)\big)\,\xi_{k_1k_2}\xi_{k_3k_4} = o(n/p_n^2)$,

we have that

\[
\|\Psi_n - I_{p_n+1}\|_2^2 = O_p(1/p_n).
\]

Note that conditions (M3') and (M4') are weaker than the corresponding conditions (M2) and (M3) and, therefore, will be satisfied under the basic assumptions. A consequence of Lemma 7.2 is

\[
|G_n| = O_p(p_n)\,o_p(1/\sqrt{p_n}) = o_p(\sqrt{p_n}).
\]

Therefore, $G_n/\sqrt{p_n} \xrightarrow{p} 0$. The bound for the term $H_n$ is completely analogous. Since we will show in Proposition 7.1 below that $(X_n^TX_n - (p_n+1))/\sqrt{2p_n} \xrightarrow{d} N(0,1)$ [this implies $|X_n^TX_n| = O_p(p_n)$], it follows that $G_n + H_n = o_p(F_n)$ so that these terms can indeed be neglected. The proof of Theorem 4.1 will therefore be complete if we show the following:

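Proposition 7.1. As $n \to \infty$, $(X_n^TX_n - (p_n+1))/\sqrt{2p_n} \xrightarrow{d} N(0,1)$.

For the proof of Proposition 7.1, we make use of

\[
X_n = \frac{\Xi^{1/2}D^Te'}{\sqrt{n}} = \Big(\frac{1}{\sqrt{n}}\sum_{\nu=1}^{n}\sum_{k=0}^{p_n}\frac{g'(\eta_\nu)}{\sigma(\mu_\nu)}\,\varepsilon_k^{(\nu)}e'_\nu\,\xi_{kt}^{(1/2)}\Big)_{t=0,\ldots,p_n}
\]

and

\[
\sum_{k=0}^{p_n}\xi_{t_1k}^{(1/2)}\xi_{kt_2}^{(1/2)} = \xi_{t_1t_2}
\]

to obtain

\[
X_n^TX_n = \frac{1}{n}\sum_{\nu_1,\nu_2=1}^{n}\frac{g'(\eta_{\nu_1})g'(\eta_{\nu_2})}{\sigma(\mu_{\nu_1})\sigma(\mu_{\nu_2})}\,e'_{\nu_1}e'_{\nu_2}\sum_{t_1,t_2,k=0}^{p_n}\varepsilon_{t_1}^{(\nu_1)}\varepsilon_{t_2}^{(\nu_2)}\xi_{t_1k}^{(1/2)}\xi_{kt_2}^{(1/2)}
\]
\[
= \frac{1}{n}\sum_{\nu=1}^{n}\frac{g'^2(\eta_\nu)}{\sigma^2(\mu_\nu)}\,e'^2_\nu\sum_{t_1,t_2=0}^{p_n}\varepsilon_{t_1}^{(\nu)}\varepsilon_{t_2}^{(\nu)}\xi_{t_1t_2}
+ \frac{1}{n}\sum_{\nu_1\ne\nu_2}e'_{\nu_1}e'_{\nu_2}\frac{g'(\eta_{\nu_1})g'(\eta_{\nu_2})}{\sigma(\mu_{\nu_1})\sigma(\mu_{\nu_2})}\sum_{t_1,t_2=0}^{p_n}\varepsilon_{t_1}^{(\nu_1)}\varepsilon_{t_2}^{(\nu_2)}\xi_{t_1t_2}
\]
\[
= A_n + B_n, \quad \text{say.}
\]

We will analyze these terms in turn and utilize the independence of the random variables associated with observations $(X_i, Y_i)$ for different values of $i$, the independence of the $e'_i$ of all $\varepsilon$'s, and $E(e') = 0$, $E(e'^2) = 1$.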



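Lemma 7.3. For $A_n$, it holds that

\[
\frac{A_n - (p_n+1)}{\sqrt{p_n}} \xrightarrow{p} 0.
\]

Turning now to the second term $B_n$, we show that it is asymptotically normal. Defining the r.v.s

\[
W_{nj} = \sum_{k=1}^{j-1} e'_ke'_j\,\frac{g'(\eta_k)g'(\eta_j)}{\sigma(\mu_k)\sigma(\mu_j)}\sum_{t_1,t_2=0}^{p_n}\varepsilon_{t_1}^{(k)}\varepsilon_{t_2}^{(j)}\xi_{t_1t_2},
\]

we may write

\[
B_n = \frac{2}{n}\sum_{j=1}^{n}W_{nj}.
\]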

A key result is now the following: 



Lemma 7.4. The random variables $\{W_{nj}, 1 \le j \le n, n \in \mathbb{N}\}$ form a triangular array of martingale difference sequences w.r.t. the filtrations $(\mathcal{F}_{nj}) = \sigma(\varepsilon_t^{(i)}, e'_i, 1 \le i \le j, 0 \le t \le p_n)$ $(1 \le j \le n, n \in \mathbb{N})$.

Note that $\mathcal{F}_{nj} \subset \mathcal{F}_{n+1,j}$. Lemma 7.4 implies that the r.v.s $\tilde W_{nj} = \frac{2}{n\sqrt{2p_n}}W_{nj}$ also form a triangular array of martingale difference sequences. According to the central limit theorem for martingale difference sequences [Brown (1971); see also Hall and Heyde (1980), Theorem 3.2 and corollaries], sufficient conditions for the asymptotic normality $\sum_{j=1}^{n}\tilde W_{nj} \xrightarrow{d} N(0,1)$ are the conditional normalization condition and the conditional Lyapunov condition. The following two lemmas, which are proved in the Appendix, demonstrate that these sufficient conditions are satisfied. We note that martingale methods have also been used by Ghorai (1980) for the asymptotic distribution of an error measure for orthogonal series density estimates.



Lemma 7.5 (Conditional normalization condition).

\[
\sum_{j=1}^{n}E(\tilde W_{nj}^2 \mid \mathcal{F}_{n,j-1}) \xrightarrow{p} 1, \quad n \to \infty.
\]

Lemma 7.6 (Conditional Lyapunov condition).

\[
\sum_{j=1}^{n}E(\tilde W_{nj}^4 \mid \mathcal{F}_{n,j-1}) \xrightarrow{p} 0, \quad n \to \infty.
\]

A consequence of Lemmas 7.5 and 7.6 is then $B_n/\sqrt{2p_n} \xrightarrow{d} N(0,1)$. Together with Lemma 7.4, this implies Proposition 7.1 and, thus, Theorem 4.1.
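
The centering and scaling in Proposition 7.1 can be checked numerically in an idealized case. The sketch below assumes, purely for illustration, that $X_n$ is exactly standard normal in dimension $p+1$, so that $X_n^TX_n$ is chi-square distributed; this Gaussian assumption is not part of the setting of the proposition, and the function name is hypothetical.

```python
import numpy as np

def standardized_quadratic_form(p, n_rep=20000, seed=0):
    """Simulate (X'X - (p + 1)) / sqrt(2 p) for X ~ N(0, I_{p+1}) (idealized assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_rep, p + 1))
    return (np.sum(X ** 2, axis=1) - (p + 1)) / np.sqrt(2.0 * p)

z = standardized_quadratic_form(p=50)
sample_mean = float(z.mean())
sample_var = float(z.var())
```

In this idealized case the standardized quantity has mean 0 and variance $(p+1)/p$, which approaches 1 as $p$ grows, in line with the normalization by $\sqrt{2p_n}$.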



APPENDIX 

We provide here the main arguments of the proofs of several corollaries 
and of the auxiliary results which were used in Section 7 for the proof of 
Theorem 4.1. 



Proof of Corollary 4.2. Extending the arguments used in the proofs of Theorems 1 and 2 in Chiou and Müller (1998), we find for these nonparametric function estimates under (R1) that



sup 

t 



g' 2 (t) g' 2 (t) 



d*(t) 

Define the matrix 
^ 1 



O r , 



l0gn +/ l 2 + . 



/Pn, 



11 



h 3 



h 2 



(DD T ) = (7, 



n 



kl)l<k,l<p n i 



lk,l 



n 

E 

i=l 



<^ 2 {m) 



According to (21) and (22), the result (15) remains the same when replacing $\Gamma$ by $\hat\Gamma$. From (R2) and observing the boundedness of $g'^2/\sigma^2$ below and above, we obtain $\hat\gamma_{kl} = \gamma_{kl}(1 + o_p(1))$, where the $o_p$-term is uniform in $k, l$ and $p_n$. The result follows by observing that the semiparametric estimate $\hat\beta$ has the same asymptotic behavior as the parametric estimate, except for some minor modifications due to the identifiability constraint. □



Proof of Corollary 4.3. The asymptotic $(1-\alpha)$ confidence ellipsoid for $\beta \in \mathbb{R}^{p_n+1}$ is $(\hat\beta-\beta)^T(\Gamma/c(\alpha))(\hat\beta-\beta) \le 1$. Expressing the vectors $\hat\beta$, $\beta$ in terms of the eigenvectors $e_k$ leads to the coefficients $\hat\beta^*_k = \sum_l e_{kl}\hat\beta_l$, $\beta^*_k = \sum_l e_{kl}\beta_l$, and with $\gamma^*_k = (\hat\beta^*_k - \beta^*_k)/\sqrt{c(\alpha)/\lambda_k}$, $\omega^*_k(t) = \omega_k(t)\sqrt{c(\alpha)/\lambda_k}$ the confidence ellipsoid corresponds to the sphere $\sum_k\gamma^{*2}_k \le 1$. To obtain the confidence band, we need to maximize $|\sum_k(\hat\beta^*_k - \beta^*_k)\omega_k(t)| = |\sum_k\gamma^*_k\omega^*_k(t)|$ w.r.t. $\gamma^*_k$, subject to $\sum_k\gamma^{*2}_k = 1$. By Cauchy-Schwarz, $|\sum_k\gamma^*_k\omega^*_k(t)| \le [\sum_k\omega^*_k(t)^2]^{1/2}$, and the maximizing $\gamma^*_k$ must be linearly dependent with the vector $\omega^*_1(t), \ldots, \omega^*_{p+1}(t)$, so that the Cauchy-Schwarz inequality becomes an equality. The result then follows from the definition of the $\omega^*_k(t)$. □
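
The maximization step in this proof can be checked numerically. The sketch below assumes that the rescaled functions $\omega^*_k(t)$ are given as an array of values on a grid (random numbers are used here as stand-ins); the function names are hypothetical. It illustrates only the Cauchy-Schwarz step, not the construction of the band from estimated quantities.

```python
import numpy as np

def band_halfwidth(omega_star):
    """Pointwise supremum of |sum_k gamma_k * omega*_k(t)| over the unit sphere.

    omega_star: array of shape (K, T) with omega*_k evaluated at T grid points.
    By Cauchy-Schwarz the supremum equals the Euclidean norm of each column.
    """
    return np.sqrt(np.sum(omega_star ** 2, axis=0))

def maximizer(omega_star_t):
    """Unit vector gamma attaining the supremum at a single t: proportional to omega*(t)."""
    return omega_star_t / np.linalg.norm(omega_star_t)

rng = np.random.default_rng(1)
omega_star = rng.normal(size=(4, 5))  # stand-in values, K = 4 components, T = 5 grid points
half = band_halfwidth(omega_star)
attained = np.array([abs(maximizer(omega_star[:, s]) @ omega_star[:, s]) for s in range(5)])
```

Any other unit vector gives a value no larger than the half-width, which is the content of the inequality used in the proof.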

Proof of Lemma 7.1. We observe



E 



Jp - D T D 



n 



= o 



K 
n 



0. 



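since $\|g^{(\nu)}(\cdot)\| \le c < \infty$, $\nu = 1, 2$, $\sigma'^2(\cdot) \le c < \infty$ and $\sigma^2(\cdot) \ge \delta > 0$ according to (M1).

Together with $p_n = o(n^{1/4})$ (M2), this implies

\[
\Big\|\Big(\frac{D^TD}{n}\Big)^{-1}\Big(\frac{J_\beta - D^TD}{n}\Big)\Big(\frac{D^TD}{n}\Big)^{-1}\frac{U(\beta)}{\sqrt{n}}\Big\|_2 = o_p(1).
\]

Similarly,

\[
\Big\|\Big(\frac{D^TD}{n}\Big)^{-1}\Big(\frac{J_{\tilde\beta} - J_\beta}{n}\Big)\Big(\frac{D^TD}{n}\Big)^{-1}\frac{U(\beta)}{\sqrt{n}}\Big\|_2 = o_p(1),
\]

whence the result follows. □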

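Proof of Lemma 7.2. Note that

\[
\|\Psi_n - I_{p_n+1}\|_2 \le \|\Psi_n\|_2\,\|\Psi_n^{-1} - I_{p_n+1}\|_2.
\]

We show that $\|\Psi_n^{-1} - I_{p_n+1}\|_2 = o_p(1)$, implying

\[
\|\Psi_n\|_2 \le \|I_{p_n+1}\|_2 + \frac{\|I_{p_n+1}\|_2\,\|\Psi_n^{-1} - I_{p_n+1}\|_2}{1 - \|\Psi_n^{-1} - I_{p_n+1}\|_2} \sim \|I_{p_n+1}\|_2 = \sqrt{p_n+1}.
\]

Observe that

\[
\Psi_n^{-1} = \Xi^{1/2}\Big(\frac{D^TD}{n}\Big)\Xi^{1/2}
\]

and, therefore,

\[
E(\|\Psi_n^{-1} - I_{p_n+1}\|_2^2) = \frac{1}{n}\sum_{k_1,\ldots,k_4=0}^{p_n}E\Big(\varepsilon_{k_1}\varepsilon_{k_2}\varepsilon_{k_3}\varepsilon_{k_4}\frac{g'^4(\eta)}{\sigma^4(\mu)}\Big)\xi_{k_1k_2}\xi_{k_3k_4} - \frac{p_n+1}{n}
= \frac{1}{n}\sum_{k_1,\ldots,k_4=0}^{p_n}E\Big(\varepsilon_{k_1}\varepsilon_{k_2}\varepsilon_{k_3}\varepsilon_{k_4}\frac{g'^4(\eta)}{\sigma^4(\mu)}\Big)\xi_{k_1k_2}\xi_{k_3k_4} + o(1/p_n^2),
\]

due to (M3'). Hence, by (M4'),

\[
\|\Psi_n - I_{p_n+1}\|_2 = O_p(\sqrt{p_n+1})\,O_p\Big(\frac{1}{p_n}\Big) = O_p(\sqrt{1/p_n}).
\]

□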



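Proof of Lemma 7.3. Since

\[
E(A_n) = \frac{1}{n}\sum_{\nu=1}^{n}E(e'^2_\nu)\sum_{t_1,t_2=0}^{p_n}E\Big(\frac{g'^2(\eta)}{\sigma^2(\mu)}\varepsilon_{t_1}\varepsilon_{t_2}\Big)\xi_{t_1t_2} = p_n+1,
\]

using the definition of $\Gamma$, $\Xi = \Gamma^{-1}$ and $E(e'^2) = 1$, and, similarly, by (M3),

\[
E(A_n^2) = o(p_n) + (p_n+1)^2\,\frac{n-1}{n},
\]

we find that $0 \le \mathrm{Var}(A_n) = o(p_n)$. This concludes the proof. □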

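Proof of Lemma 7.4. All random variables with upper index $j$ are independent of $\mathcal{F}_{n,j-1}$. Hence, we obtain

\[
E(W_{nj} \mid \mathcal{F}_{n,j-1}) = \sum_{k=1}^{j-1}\sum_{t_1,t_2=0}^{p_n}e'_k\frac{g'(\eta_k)}{\sigma(\mu_k)}\varepsilon_{t_1}^{(k)}\xi_{t_1t_2}\,E\Big(e'_j\frac{g'(\eta_j)}{\sigma(\mu_j)}\varepsilon_{t_2}^{(j)}\,\Big|\,\mathcal{F}_{n,j-1}\Big) = 0,
\]

since $E(e'_j) = 0$ and $e'_j$ is independent of the $\varepsilon$'s. □

Proof of Lemma 7.5. We note

\[
E(W_{nj}^2 \mid \mathcal{F}_{n,j-1}) = \sum_{\nu_1,\nu_2=1}^{j-1}e'_{\nu_1}e'_{\nu_2}\frac{g'(\eta_{\nu_1})g'(\eta_{\nu_2})}{\sigma(\mu_{\nu_1})\sigma(\mu_{\nu_2})}\sum_{t_1,\ldots,t_4=0}^{p_n}\varepsilon_{t_1}^{(\nu_1)}\varepsilon_{t_3}^{(\nu_2)}\xi_{t_1t_2}\xi_{t_3t_4}\gamma_{t_2t_4}
\]

and obtain

\[
E(W_{nj}^2) = (j-1)(p_n+1).
\]

This implies

\[
\sum_{j=1}^{n}E(\tilde W_{nj}^2) \to 1.
\]

We are done if we can show $\mathrm{Var}\big(\sum_{j=1}^{n}E(\tilde W_{nj}^2 \mid \mathcal{F}_{n,j-1})\big) \to 0$. In order to obtain the second moments, we first note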

E{E(W^ i |^ nJ _ 1 )E(W^|J> i , fc _ 1 )} 



i-i fc-i 

: E X! E| ' 
+ (j - - + l) 2 + 2(fc - l)\ Pn + 1), 



and then obtain 
Pn ( 9'^W) \ 
*l,...,t4=0 



+ ^( Pn + 1) 2 (1 + 0(1)) + ^( Pn + 1)(1 + o(l)). 

Applying (M2), we infer 

E^E{E(^.|^ nj ._ l)} ^ j=l + (l) 

and conclude that 

Var^E{E(^|^-i)}^0, 
whence the result follows. □ 



and E(E(W£ i |.F n j_i)) with (M2) and (M3) leads to 



n 

H 

■ nj 



1 

n 4 p2 



Proof of Lemma 7.6. Combining detailed calculations of E(W / "„ J |jF nj _ 1 ) 
d E(E(V 

n 

ti,...,t 8 =0 V ° W 7 

/g /4 (r?) \ 
X E \a 4 (7?) Eti£t * £t s £t s J 61*263*465*667*8 

^re* 3 e*4 e *7 e *8 ) 

*3,*4,*7,*8=0 V<7 W y J 

= o(l), 
completing the proof. □ 